projects-land commented Dec 22, 2025

This change is based on work done by @kprinssu at https://github.com/kprinssu/Kokoro-FastAPI

I don't necessarily expect this PR to get accepted, but I thought I'd throw it out there in case others found it useful.

To support AMD gfx1151 GPUs such as Strix Halo, I'm using the latest ROCm release, 7.10.0. This necessitated moving Python to 3.12.

In addition, ROCm isn't supported on aarch64, so I needed to make some build changes so that this permutation builds only for amd64.

If you are interested in taking this change, I'd be happy to iterate on it, especially if you'd prefer to find a way to keep the CPU and GPU permutations on Python 3.10.

…Strix Halo

This change uses the latest ROCm 7.10.0 release, which necessitates moving to Python 3.12. ROCm also doesn't support building for aarch64, so some changes were made to make this permutation build only on amd64.

kprinssu (Contributor) commented Dec 22, 2025

Hey @projects-land, this is tightly coupled to GFX1151b, and I think it could be expanded to other AMD GPU architectures.

I am also glad my fork helped other folks, but it would be great if you would add attributions to @bgs4free (as they did the legwork for setting up PyTorch and uv) and myself.

I am also planning to upstream most of the changes from my fork. I am hoping to find some time in the next couple of weeks to do so.

Temppus commented Jan 3, 2026

Thanks for the PR. I will try these changes on my PC with an AMD Ryzen AI Max+ 395. It would be great to get this support merged.

@worldowner

I built the image using your PR, exported it with docker export, and tried running it in systemd-nspawn (I don't have docker/podman installed):

systemd-nspawn -D /home/containers/koroko/ \
  --bind=/dev/dri \
  --bind=/dev/kfd \
  --property="DeviceAllow=char-drm rw" \
  -E PYTHONPATH=/app:/app/api \
  -E USE_GPU=true \
  -E PYTHONUNBUFFERED=1 \
  -E API_LOG_LEVEL=DEBUG \
  -E TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 \
  -E MIOPEN_FIND_MODE=3 \
  -E MIOPEN_FIND_ENFORCE=3 \
  -E UV_LINK_MODE=copy \
  -E PHONEMIZER_ESPEAK_PATH=/usr/bin \
  -E PHONEMIZER_ESPEAK_DATA=/usr/share/espeak-ng-data \
  -E ESPEAK_DATA_PATH=/usr/share/espeak-ng-data \
  -E DEVICE="rocm1151" \
  -E ROCM_PATH=/app/rocm_install \
  -E PATH="/usr/bin:/usr/local/bin:/app/.venv/bin:/app/rocm_install/bin" \
  --system-call-filter=~@ \
  --capability=CAP_SYS_PTRACE \
  --user=appuser \
  --as-pid2 \
  /bin/sh -c 'cd /app && ./entrypoint.sh'
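
One thing that can be ruled out cheaply with a setup like the one above is whether the bound GPU device nodes are actually usable by the unprivileged user. The following is only a minimal sketch: `check_device_nodes` is a hypothetical helper (not part of Kokoro-FastAPI), and it assumes the usual Linux layout where render nodes appear as /dev/dri/renderD*. It would be run inside the container as the same user as the server.

```python
import glob
import os


def check_device_nodes(paths):
    """Return {path: status} for each device node, as seen by the current user."""
    report = {}
    for path in paths:
        if not os.path.exists(path):
            report[path] = "missing"
        elif os.access(path, os.R_OK | os.W_OK):
            report[path] = "read/write ok"
        else:
            report[path] = "no read/write permission"
    return report


if __name__ == "__main__":
    # /dev/kfd is the AMD compute device node; renderD* are the DRM render nodes.
    nodes = ["/dev/kfd"] + sorted(glob.glob("/dev/dri/renderD*"))
    for node, status in check_device_nodes(nodes).items():
        print(f"{node}: {status}")
```

A "missing" or "no read/write permission" result would point at the container setup rather than at the ROCm build. A clean result does not rule out other causes.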

Logs:

Global API loguru logger level: DEBUG
INFO:     Started server process [8]
INFO:     Waiting for application startup.
03:11:30 PM | INFO     | main:62 | Loading TTS model and voice packs...
03:11:30 PM | INFO     | model_manager:38 | Initializing Kokoro V1 on cuda
03:11:30 PM | DEBUG    | paths:101 | Searching for model in path: /app/api/src/models
03:11:30 PM | INFO     | kokoro_v1:46 | Loading Kokoro model on cuda
03:11:30 PM | INFO     | kokoro_v1:47 | Config path: /app/api/src/models/v1_0/config.json
03:11:30 PM | INFO     | kokoro_v1:48 | Model path: /app/api/src/models/v1_0/kokoro-v1_0.pth
WARNING: Defaulting repo_id to hexgrad/Kokoro-82M. Pass repo_id='hexgrad/Kokoro-82M' to suppress this warning.
/app/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py:123: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
  warnings.warn(
/app/.venv/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:144: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
Container koroko failed with error code 139.

The container exits with SIGSEGV (exit code 139 = 128 + 11), and the kernel log shows:

python3[12731]: segfault at 34 ip 00007f19108ac99e sp 00007ffeb484e650 error 4 in libhsa-runtime64.so.1[ab99e,7f1910894000+116000] likely on CPU 5 (core 5, socket 0)
Code: b2 0f 00 66 0f 1f 84 00 00 00 00 00 41 57 41 56 41 54 53 48 83 ec 18 48 89 fb 4c 8d bf e8 04 00 00 4c 8b 76 68 4d 39 fe 74 7c <41> 83 7e 34 03 0f 85 84 00 00 00 49 8b 46 20 49 8b 4e 28 41 0f b6
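
The exit code and the kernel line above can be decoded mechanically. The sketch below is illustrative only: the helper names are hypothetical, the regular expression assumes the x86 `segfault at ... in lib[offset,base+size]` layout seen in this log, and the error-code bits follow the x86 page-fault convention.

```python
import re
import signal

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)"
    r"\[(?P<off>[0-9a-f]+),(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def signal_from_exit_code(code):
    """Shell convention: a status above 128 means 'killed by signal (code - 128)'."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None


def parse_segfault(line):
    """Pull the numeric fields out of an x86 kernel 'segfault at ...' log line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        "ip_minus_mapping_base": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error bits: 0 = protection violation, 1 = write, 2 = user mode.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
    }
```

Applied to this report, `signal_from_exit_code(139)` gives `SIGSEGV`, and the kernel line parses as a user-mode read of a not-present page at address 0x34 inside libhsa-runtime64.so.1. That pattern is consistent with reading a field at a small offset from a null pointer, although it does not by itself identify the cause.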

System information:

  • OS: Arch Linux
  • Kernel: 6.18.5
  • linux-firmware: 20260110
  • Framework Desktop with Strix Halo (MAX+ 395 - 128GB)

Is this likely caused by an incompatible or outdated ROCm build (is a more recent nightly needed?), or does it look like a different ROCm runtime issue, possibly related to libhsa-runtime64?

worldowner commented Jan 17, 2026

When I add CUDA_VISIBLE_DEVICES="", it loads on the CPU. On the previous attempt I saw "Loading Kokoro model on cuda". Am I missing a variable to force it to load the model on ROCm?
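
For context on the "cuda" in that log line: ROCm builds of PyTorch reuse the `torch.cuda` API and the "cuda" device string, so the message alone does not show which backend is in use. A small sketch for checking the build is below. `describe_backend` is a hypothetical helper (not part of Kokoro-FastAPI), and it relies only on `torch.version.hip`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def describe_backend(hip_version, cuda_version, gpu_available):
    """Classify a PyTorch build from torch.version.hip/cuda and GPU visibility."""
    if hip_version is not None:
        build = f"ROCm/HIP build ({hip_version})"
    elif cuda_version is not None:
        build = f"CUDA build ({cuda_version})"
    else:
        build = "CPU-only build"
    state = "GPU visible" if gpu_available else "no GPU visible"
    return f"{build}, {state}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(
            describe_backend(
                torch.version.hip,
                torch.version.cuda,
                torch.cuda.is_available(),
            )
        )
```

If this reports a ROCm/HIP build with a visible GPU, then "Loading Kokoro model on cuda" would be the expected wording for a ROCm run, and the segfault is more likely to lie in the runtime than in device selection.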
